    Marker-based filtering of bilingual phrase pairs for SMT

    State-of-the-art statistical machine translation systems make use of a large translation table obtained after scoring a set of bilingual phrase pairs automatically extracted from a parallel corpus. The number of bilingual phrase pairs extracted from a pair of aligned sentences grows exponentially as the length of the sentences increases; therefore, the number of entries in the phrase table used to carry out the translation may become unmanageable, especially when online, 'on demand' translation is required in real time. We describe the use of closed-class words to filter the set of bilingual phrase pairs extracted from the parallel corpus by taking into account the alignment information and the type of the words involved in the alignments. On four European language pairs, we show that our simple yet novel approach can filter the phrase table by up to a third yet still provide competitive results compared to the baseline. Furthermore, it provides a good balance between the unfiltered approach and pruning using stop words, where the deterioration in translation quality is unacceptably high.

    Hybrid rule-based - example-based MT: feeding Apertium with sub-sentential translation units

    This paper describes a hybrid machine translation (MT) approach that consists of integrating bilingual chunks (sub-sentential translation units) obtained from parallel corpora into an MT system built using the Apertium free/open-source rule-based machine translation platform, which uses a shallow-transfer translation approach. In the integration of bilingual chunks, special care has been taken so as not to break the application of the existing Apertium structural transfer rules, since this would increase the number of ungrammatical translations. The method consists of (i) the application of a dynamic-programming algorithm to compute the best translation coverage of the input sentence given the collection of bilingual chunks available; (ii) the translation of the input sentence as usual by Apertium; and (iii) the application of a language model to choose one of the possible translations for each of the bilingual chunks detected. Results are reported for translation from English to Spanish, and vice versa, when marker-based bilingual chunks automatically obtained from parallel corpora are used.
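
    Step (i) of the abstract above can be illustrated with a minimal sketch. The abstract does not define "best coverage", so this example assumes it means covering as many source words as possible with non-overlapping chunks, preferring fewer chunks on ties. The function name `best_coverage`, the `max_len` limit and the toy chunk set are all hypothetical and are not taken from the paper or from Apertium.

```python
from typing import List, Optional, Set, Tuple


def best_coverage(
    tokens: List[str],
    chunks: Set[Tuple[str, ...]],
    max_len: int = 5,
) -> Tuple[int, List[Tuple[int, int]]]:
    """Return (covered word count, chunk spans) for one sentence.

    Assumed objective (not from the paper): maximise the number of
    covered words; on ties, use fewer chunks. Spans are half-open
    (start, end) token indices.
    """
    n = len(tokens)
    # best[i] = (covered words, -chunks used) for the prefix tokens[:i]
    best: List[Tuple[int, int]] = [(0, 0)] * (n + 1)
    # back[i] = start of the chunk ending at i, or None if token i-1 is uncovered
    back: List[Optional[int]] = [None] * (n + 1)

    for i in range(1, n + 1):
        # Default option: leave token i-1 to the regular translation path.
        best[i] = best[i - 1]
        back[i] = None
        for j in range(max(0, i - max_len), i):
            if tuple(tokens[j:i]) in chunks:
                cand = (best[j][0] + (i - j), best[j][1] - 1)
                if cand > best[i]:
                    best[i] = cand
                    back[i] = j

    # Walk the back-pointers to recover the chosen spans.
    spans: List[Tuple[int, int]] = []
    i = n
    while i > 0:
        j = back[i]
        if j is None:
            i -= 1
        else:
            spans.append((j, i))
            i = j
    spans.reverse()
    return best[n][0], spans


# Toy data, invented for illustration only.
sentence = "the white house is very big".split()
toy_chunks = {
    ("the",),
    ("the", "white", "house"),
    ("white", "house", "is"),
    ("very", "big"),
}
covered, spans = best_coverage(sentence, toy_chunks)
```

    In this toy case the longest chunk at the start of the sentence is not part of the best solution: choosing "the" followed by "white house is" leaves no word uncovered, which a greedy longest-match strategy would miss.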

    Using unsupervised corpus-based methods to build rule-based machine translation systems

    PhD thesis in Computer Engineering written by Felipe Sánchez-Martínez at Universitat d’Alacant under the joint supervision of Dr. Juan Antonio Pérez-Ortiz and Dr. Mikel L. Forcada. The author was examined on June 30, 2008 by the committee formed by Dr. Rafael C. Carrasco (Univ. d’Alacant), Dr. Lluís Padró and Dr. Lluís Màrquez (Univ. Politècnica de Catalunya), Dr. Harold Somers (Univ. of Manchester) and Dr. Andy Way (Dublin City Univ.). The grade obtained was Sobresaliente Cum Laude (highest mark), awarded unanimously, with the European Doctor mention. Thesis funded by the Spanish Ministry of Education and Science and the European Social Fund through research grant BES-2004-4711.

    Motivos del creciente uso de traducción automática seguida de posedición

    This article discusses the causes which, in the author's opinion, have led to the growing adoption of machine translation (MT) to produce drafts for post-editing. There are four main causes: improvements in MT techniques, increased availability of resources such as software and data, a change in users' expectations about what can and cannot be expected from an MT system, and better integration of MT systems into computer-aided translation tools.

    Evaluating the more suitable ISM frequency band for IoT-based smart grids: a quantitative study of 915 MHz vs. 2400 MHz

    IoT has begun to be employed pervasively in industrial environments and critical infrastructures thanks to its positive impact on performance and efficiency. Among these environments, the Smart Grid (SG) stands out as a natural host for this technology, mainly due to its potential to become the motor of the rest of electrically-dependent infrastructures. To make this SG-oriented IoT cost-effective, most deployments employ unlicensed ISM bands, specifically the 2400 MHz one, due to its larger communication bandwidth in comparison with lower bands. This band has been extensively used for years by Wireless Sensor Networks (WSN) and Mobile Ad-hoc Networks (MANET), from which the IoT technologically inherits. However, this work questions and evaluates the suitability of such a "default" communication band in SG environments, compared with the 915 MHz ISM band. A comprehensive quantitative comparison of these bands has been carried out in terms of power consumption, average network delay, and packet reception rate. To allow such a study, a dual-band propagation model specifically designed for the SG has been derived, tested, and incorporated into the well-known TOSSIM simulator. Simulation results reveal that the 2400 MHz band is the best option only in the absence of other 2400 MHz interfering devices (such as WiFi or Bluetooth) or in small networks. In any other case, SG-oriented IoT quantitatively performs better when operating in the 915 MHz band. This research was supported by the MINECO/FEDER project grants TEC2013-47016-C2-2-R (COINS) and TEC2016-76465-C2-1-R (AIM). The authors would like to thank Juan Salvador Perez Madrid and Domingo Meca (part of the Iberdrola staff) for the support provided during the realization of this work. Ruben M. Sandoval also thanks the Spanish MICINN for an FPU (REF FPU14/03424) pre-doctoral fellowship.

    Tecnologías de la Traducción: Actividades opcionales

    Optional activities on translation technologies.

    Integrating Rules and Dictionaries from Shallow-Transfer Machine Translation into Phrase-Based Statistical Machine Translation

    We describe a hybridisation strategy whose objective is to integrate linguistic resources from shallow-transfer rule-based machine translation (RBMT) into phrase-based statistical machine translation (PBSMT). It consists of enriching the phrase table of a PBSMT system with bilingual phrase pairs matching transfer rules and dictionary entries from a shallow-transfer RBMT system. This new strategy takes advantage of how the linguistic resources are used by the RBMT system to segment the source-language sentences to be translated, and overcomes the limitations of existing hybrid approaches that treat the RBMT systems as a black box. Experimental results confirm that our approach delivers translations of higher quality than existing ones, and that it is especially useful when the parallel corpus available for training the SMT system is small or when translating out-of-domain texts that are well covered by the RBMT dictionaries. A combination of this approach with a recently proposed unsupervised shallow-transfer rule inference algorithm results in a significantly greater translation quality than that of a baseline PBSMT; in this case, the only hand-crafted resources used are the dictionaries commonly used in RBMT. Moreover, the translation quality achieved by the hybrid system built with automatically inferred rules is similar to that obtained by those built with hand-crafted rules. Research funded by the Spanish Ministry of Economy and Competitiveness through projects TIN2009-14009-C02-01 and TIN2012-32615, by Generalitat Valenciana through grant ACIF 2010/174, and by the European Union Seventh Framework Programme FP7/2007-2013 under grant agreement PIAP-GA-2012-324414 (Abu-MaTran).

    A generalised alignment template formalism and its application to the inference of shallow-transfer machine translation rules from scarce bilingual corpora

    Statistical and rule-based methods are complementary approaches to machine translation (MT) that have different strengths and weaknesses. This complementarity has, over the last few years, led to a growing interest in hybrid systems that combine both data-driven and linguistic approaches. In this paper, we address the situation in which the amount of bilingual resources that is available for a particular language pair is not sufficiently large to train a competitive statistical MT system, but the cost and slow development cycles of rule-based MT systems cannot be afforded either. In this context, we formalise a new method that uses scarce parallel corpora to automatically infer a set of shallow-transfer rules to be integrated into a rule-based MT system, thus avoiding the need for human experts to handcraft these rules. Our work is based on the alignment template approach to phrase-based statistical MT, but the definition of the alignment template is extended to encompass different generalisation levels. It is also greatly inspired by the work of Sánchez-Martínez and Forcada (2009) in which alignment templates were also considered for shallow-transfer rule inference. However, our approach overcomes many relevant limitations of that work, principally those related to the inability to find the correct generalisation level for the alignment templates, and to select the subset of alignment templates that ensures an adequate segmentation of the input sentences by the rules eventually obtained. Unlike previous approaches in the literature, our formalism does not require linguistic knowledge about the languages involved in the translation. Moreover, it is the first time that conflicts between rules are resolved by choosing the most appropriate ones according to a global minimisation function rather than proceeding in a pairwise greedy fashion.
    Experiments conducted using five different language pairs with the free/open-source rule-based MT platform Apertium show that translation quality significantly improves when compared to the method proposed by Sánchez-Martínez and Forcada (2009), and is close to that obtained using handcrafted rules. For some language pairs, our approach is even able to outperform them. Moreover, the resulting number of rules is considerably smaller, which eases human revision and maintenance. Research funded by Universitat d’Alacant through project GRE11-20, by the Spanish Ministry of Economy and Competitiveness through projects TIN2009-14009-C02-01 and TIN2012-32615, by Generalitat Valenciana through grant ACIF/2010/174, and by the European Union Seventh Framework Programme FP7/2007-2013 under grant agreement PIAP-GA-2012-324414 (Abu-MaTran).